Learning Top-k Transformation Rules
Internal identifier: 000464 (Main/Exploration); previous: 000463; next: 000465
Authors: Sunanda Patro [Australia]; Wei Wang [Australia]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2011.
Abstract
Record linkage identifies multiple records referring to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence are not able to identify complex coreference records. This seriously limits the effectiveness of existing approaches. In this work, we propose an automatic method to extract top-k high-quality transformation rules given a set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses for each record pair and generates candidate rules; the algorithm finally chooses top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has a substantial advantage over the previous algorithm in both effectiveness and efficiency.
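The pipeline outlined in the abstract (generate candidate rules per record pair, then keep the k best under a scoring function) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `candidate_rules` and `top_k_rules`, the token-difference heuristic standing in for the "local analysis", the support-count score, and the sample record pairs are all assumptions made for illustration.

```python
from collections import Counter


def candidate_rules(record_a, record_b):
    """Yield a hypothetical candidate rule (lhs, rhs) for one record pair.

    Assumption: tokens shared by both records count as already matched,
    and the leftover tokens on each side form the two sides of a rule.
    """
    tokens_a = record_a.lower().split()
    tokens_b = record_b.lower().split()
    shared = set(tokens_a) & set(tokens_b)
    lhs = tuple(t for t in tokens_a if t not in shared)
    rhs = tuple(t for t in tokens_b if t not in shared)
    if lhs and rhs:
        yield lhs, rhs


def top_k_rules(pairs, k):
    """Score each candidate by the number of pairs producing it; keep the best k.

    Assumption: plain support counting stands in for the scoring function.
    """
    support = Counter()
    for record_a, record_b in pairs:
        support.update(candidate_rules(record_a, record_b))
    return support.most_common(k)


# Made-up record pairs, assumed to be possibly coreferent.
pairs = [
    ("proc vldb 2005", "proceedings vldb 2005"),
    ("proc icde 2010", "proceedings icde 2010"),
    ("intl conf data", "international conf data"),
]
rules = top_k_rules(pairs, k=2)
for (lhs, rhs), score in rules:
    print(" ".join(lhs), "->", " ".join(rhs), "| support:", score)
```

In this toy input the rule `proc -> proceedings` is produced by two pairs and therefore ranks first. A realistic scoring function would also need to penalise rules that fire on non-coreferent pairs, which this sketch does not attempt.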
DOI: 10.1007/978-3-642-23088-2_12
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001184
- to stream Istex, to step Curation: 001114
- to stream Istex, to step Checkpoint: 000121
- to stream Main, to step Merge: 000470
- to stream Main, to step Curation: 000464
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Learning Top- k Transformation Rules</title>
<author><name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
</author>
<author><name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-23088-2_12</idno>
<idno type="url">https://api.istex.fr/document/0E53B146AE762B16D3A5D89E42E870FCD55FC2D6/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001184</idno>
<idno type="wicri:Area/Istex/Curation">001114</idno>
<idno type="wicri:Area/Istex/Checkpoint">000121</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Patro S:learning:top:k</idno>
<idno type="wicri:Area/Main/Merge">000470</idno>
<idno type="wicri:Area/Main/Curation">000464</idno>
<idno type="wicri:Area/Main/Exploration">000464</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Learning Top- k Transformation Rules</title>
<author><name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
<affiliation wicri:level="1"><country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Australie</country>
</affiliation>
</author>
<author><name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
<affiliation wicri:level="1"><country xml:lang="fr">Australie</country>
<wicri:regionArea>University of New South Wales</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Australie</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">0E53B146AE762B16D3A5D89E42E870FCD55FC2D6</idno>
<idno type="DOI">10.1007/978-3-642-23088-2_12</idno>
<idno type="ChapterID">12</idno>
<idno type="ChapterID">Chap12</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Record linkage identifies multiple records referring to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches are mainly relying on similarity functions based on the surface forms of the records, and hence are not able to identify complex coreference records. This seriously limits the effectiveness of existing approaches. In this work, we propose an automatic method to extract top-k high quality transformation rules given a set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses for each record pair and generates candidate rules; the algorithm finally chooses top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has substantial advantage over the previous algorithm in both effectiveness and efficiency.</div>
</front>
</TEI>
<affiliations><list><country><li>Australie</li>
</country>
</list>
<tree><country name="Australie"><noRegion><name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
</noRegion>
<name sortKey="Patro, Sunanda" sort="Patro, Sunanda" uniqKey="Patro S" first="Sunanda" last="Patro">Sunanda Patro</name>
<name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
<name sortKey="Wang, Wei" sort="Wang, Wei" uniqKey="Wang W" first="Wei" last="Wang">Wei Wang</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000464 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000464 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:0E53B146AE762B16D3A5D89E42E870FCD55FC2D6 |texte= Learning Top- k Transformation Rules }}
This area was generated with Dilib version V0.6.32.